This is slightly old and will eventually need rewriting.
For the moment, we shall presume we are only comparing grid-based predictions (though with an eye to the very small grid cells that may arise from continuous predictions).
We first discuss a very high-level overview of a "pipeline" for producing predictions, scoring them, and comparing different predictions. We then provide more details on each stage.
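As a rough illustration of the three stages (predict, score, compare), here is a minimal sketch for grid-based predictions. Everything in it is an assumption made for illustration: the function names `hit_rate` and `compare` are hypothetical, a prediction is taken to be a 2D NumPy array of non-negative risk values, and the "hit rate at a given coverage" score is just one plausible choice, not necessarily the scoring method discussed later.

```python
import numpy as np

def hit_rate(prediction, event_counts, coverage):
    """Score one grid prediction (hypothetical scoring rule).

    Flags the top `coverage` fraction of grid cells by predicted risk and
    returns the fraction of all events that fall in the flagged cells.
    """
    flat_risk = np.asarray(prediction, dtype=float).ravel()
    flat_events = np.asarray(event_counts, dtype=float).ravel()
    n_flagged = max(1, int(round(coverage * flat_risk.size)))
    # Indices of the cells with the highest predicted risk
    top = np.argsort(flat_risk)[::-1][:n_flagged]
    total = flat_events.sum()
    return float(flat_events[top].sum() / total) if total > 0 else 0.0

def compare(predictions, event_counts, coverage=0.1):
    """Score each named prediction against the same observed events."""
    return {name: hit_rate(grid, event_counts, coverage)
            for name, grid in predictions.items()}

# Observed events on a 4x5 grid (made-up data, for illustration only)
events = np.zeros((4, 5))
events[0, 0] = 3
events[2, 3] = 1

predictions = {
    "matches_events": events.copy(),      # risk concentrated where events occur
    "inverted": events.max() - events,    # risk concentrated where no events occur
}
scores = compare(predictions, events, coverage=0.1)
```

Keeping scoring separate from comparison means a different scoring rule can be swapped in without touching the code that produces or collects the predictions.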
(Not in alphabetical order, sorry, so that adding a new article doesn't require changing all the numbers above :smile:)